Comparison of parallel infill sampling criteria based on Kriging surrogate model

One of the key issues that affect the optimization effect of the efficient global optimization (EGO) algorithm is to determine the infill sampling criterion. Therefore, this paper compares the common efficient parallel infill sampling criterion. In addition, the pseudo-expected improvement (EI) criterion is introduced to minimizing the predicted (MP) criterion and the probability of improvement (PI) criterion, which helps to improve the problem of MP criterion that is easy to fall into local optimum. An adaptive distance function is proposed, which is used to avoid the concentration problem of update points and also improves the global search ability of the infill sampling criterion. Seven test problems were used to evaluate these criteria to verify the effectiveness of these methods. The results show that the pseudo method is also applicable to PI and MP criteria. The DMP and PEI criteria are the most efficient and robust. The actual engineering optimization problems can more directly show the effects of these methods. So these criteria are applied to the inverse design of RAE2822 airfoil. The results show the criterion including the MP has higher optimization efficiency.


List of symbols
www.nature.com/scientificreports/ efficiency and robustness with traditional parallel criteria. It can provide a reference for other people to choose the parallel criteria. Furthermore, in the third part, several parallel infill sampling criteria are applied to the actual inverse design of two-dimensional airfoil. Comparing the difference in the optimal speed and optimal value of different criteria and updating points, which is beneficial to provide a reference for airfoil optimization.

The efficient global optimization algorithm
Kriging model. Kriging model regards the target function as a combination of a regression model and a stochastic process. The regression model usually uses a constant. The initial samples are S = x 1 , x 2 , . . . , x n T ∈ R n * n dim . The corresponding function values are y = y 1 (x), y 2 (x), . . . , y n (x) T ∈ R n * n dim where µ is a constant, z(x) is the stochastic process assumed to have mean zero and covariance between z(x) and z(v) , where σ 2 is the process variance of the response and ℜ(x, v, θ ) is the correlation model with parameters θ . The correlation model is a function of the distance between the sample points. When the distance between the two points is small, the value of the correlation function is near one. As the distance increases, the value tends to zero. For an n-dimensional problem, the correlation function ℜ(x, v, θ ) that represents the correlation between point x(j) and point v(j) is shown by Based on the unbiased estimation E ŷ(x) − y(x) = 0 , the corresponding maximum likelihood estimate of the variance is: A detailed description can be found in 2 . Kriging predictor and the mean squared error can be obtained as follows.
In the equations, R is a matrix with entry R ij (x) = ℜ x i , x j , θ , r is a n dim -dimensional vector with entry r i (x) = ℜ(x, x i , θ) , and 1 is n dim -dimensional vector of ones.
Minimizing the prediction criterion. The premise of applying this criterion is that the surrogate model is accurate. The criterion directly searches for the minimum objective based on the surrogate models. But it is easy to fall into the local optimum trap. In this paper, to find the maximum value of the criterion uniformly, the MP criterion is negative.
Expected improvement criterion. The expected improvement is defined as the mathematical expectation of the improvement value at a predicted point 5 . The improvement can be defined as I(x) = max y min − y(x), 0 . Then the expected improvement is given by where �(x) and φ(x) are the cumulative distribution and probability density function of the standard normal distribution, respectively.
It can be seen from the formula of EI that it considers both the prediction value and the prediction error of the Kriging model. Thus it gives a balance between local search and global search.
Probability of improvement criterion. The probability of improvement is defined as the probability that the predicted point value is less than the current true optimal value 17 . In general, we want to achieve a smaller value T than the current true optimal value. The value of T is as follows: www.nature.com/scientificreports/ where κ is the improvement factor. The κ is set to 0.1 6 . The probability of improvement is described below: Pseudo expected improvement criterion. The principle is to reduce the value near the updated points by introducing influence functions. The influence function is shown in (11) 15 .
The first updated point is identified by maximizing the standard EI function. As the optimization process goes on, the q th updated point is identified as: The detailed adding process can be seen at the end of this section. Similarly, we use the same steps to apply the pseudo method to MP and PI criteria.
Distance function. Viana et al. 17 introduced a fixed distance to constrain updated points. The distance of each update point in each loop is greater than the given value. The distance is defined as 0.1 * √ n dim . n dim is the number of dimensions of the test problem. The parallel minimizing the prediction criterion with a fixed distance to constrain is referred to as the MMP criterion. The parallel minimizing the prediction criterion with a fixed distance to constrain updated points is referred to as the MMP criterion. The parallel probability of improvement with a fixed distance to constrain updated points is referred to as the MPI criterion. The parallel expected improvement with a fixed distance to constrain is referred to as the MEI criterion. The range of influence of the real updated point is not the same as the given distance. So it is not reasonable to use a fixed distance directly. An adaptive distance constraint is proposed based on the PEI. The method also uses correlation functions. When the value of the correlation function is less than the threshold δ ∈ (0, 1) , the value of the distance function is 1. Otherwise, it is 0. The distance function not only avoids the concentration of the candidate solution but also improves the global searching ability. Because the hyper-parameters are anisotropic, so the distance function is also anisotropic in all directions.
Applying this method can refer to Viana's 17 or Zhan's 15 . In this paper, the method of Zhan 15 is taken as an example. The q th updated point is identified as: www.nature.com/scientificreports/ Figure 1 shows the flowchart of the parallel updated points. The specific optimization process is as follows.
Step 1: Parameter initialization. The parameters, including design domain, boundary constraints, relative parameters of the Kriging model, number of updating points, and convergence condition are initialized.
Step 2: Obtain initial samples by LHD.
Step 3: Build a Kriging model based on the current design samples.
Step 4: Take the expected improvement (EI) criterion as an example to illustrate the process of parallel adding points.
(a) Identify the first update point by looking for the maximum EI function value using GA. (b) An approximation of the updated EI function is calculated by multiplying the initial EI function by the influence function of the point x (n+1) : www.nature.com/scientificreports/ (c) The second updating point is produced by maximizing the approximated updated EI function: (d) As the optimization process goes on, the q th updated point is identified as (13): (2) Fix distance function method.
(a) Identify the first update point by looking for the maximum EI function value.
(b) The second updating point is identified as: (c) As the optimization process goes on, the q th updated point is identified as: (3) Adaptive distance function.
(a) Identify the first update point by looking for the maximum EI function value.
(b) The second updating point is produced by maximizing the approximated updated EI function: (c) As the optimization process goes on, the q th updated point is identified as (16): Step 5: Convergence conditions. If the maximum number of function evaluations is met or the maximum value of the EI is less than 0.1%, the whole loop is terminated. If it is not met, add the sampling points after the choice to the sample set, and back to Step 3.
To make the progress of parallel adding points more intuitive, the author uses a one-dimensional function in "Demonstrative example" to compare the process of single and parallel adding points.

Numerical experiments
Demonstrative example. The Forrester function 19 y = (6x − 1) 2 sin (12x − 4), x ∈ [0, 1] is used to demonstrate the process of updating points. The initial sampling points are X = [0, 0.5, 0.7, 1] . Four updating points are given in each cycle or iteration for the parallel criterion. Four cycles are demonstrated for the standard single-point criterion. Figure 2 shows the algorithm based on the EI criterion. The updated point position of the PEI, DEI, and MEI criteria is close to the standard EI criterion. The difference is the order of adding the point. For the KB criterion, the decay of the EI value is rapid because of the change of the hyper-parameter after introducing the 'fake' point. The other three only construct the Kriging model once, so the EI function is real. Therefore, it can be found that the parallel process essentially looks for other local optimal values. DEI and MEI criteria are similar in the first cycle, but the distance function will change with the update of the Kriging model for DEI. Figure 3 shows the algorithm based on the PI criterion. The updated points of the PI criterion are almost consistent with the EI criterion, which shows that the PI criterion is also an effective infilling samples method. And the updated points of MPI, PPI, and DPI criteria are also similar to the PEI, DEI, and MEI criteria. Therefore, the parallel probability of improvement (PI) criterion can obtain effective updated points. Figure 4 shows the algorithm based on the MP criterion. The standard MP criterion has the poor global searching ability. Therefore, the candidate points are concentrated near the current optimal value. Although the optimal solution can be found when the surrogate model is accurate, it is not conducive to improving the prediction accuracy of the model. After the parallel criterion is introduced, the points near the candidate points cannot be added so that the search expands outward. It is conducive to finding local optimal values and improving the global searching ability.
Through the demonstration of a one-dimensional function, the parallel criteria proposed in this paper can effectively find update points after being introduced into the PI criterion. At the same time, the MP criterion can be prevented from falling into the local optimum.
Test problems. Seven test problems are used for this experiment and described below.
When the relative error of the optimal value is less than 0.01, we believe that convergence has been achieved. For the first four test problems, the maximum number of cycles is 100. The number of cycles is counted when optimization is done. The distance function threshold is set to 0.1. For the 5th to 7th test problems, because it is difficult to reach the optimal value for the high dimensional problems using 100 iterations. The maximum number of cycles is set to 20, 40, 60, 80, corresponding to the process of adding 2, 4, 6, 8, and 10 update points. The optimal value is counted, when the number of cycles reaches. The threshold value of the distance function is set to 0.5. Figures 5, 6, and 7 show the boxplot of two and three-dimensional test problems. The median and interquartile range of DEI, PEI, and KB criteria are close. The median and interquartile range of DEI is larger. The number of cycles decreases with the updated point increasing. But the rate of decrease is slowing down, which indicates that the optimization rate cannot increase linearly as the number of update points increases. KB criterion has a lot of outliers when the updated point is great than two for the Branin function, which indicates that KB may be invalid when the number of the parallel updated points added is large. For the two-dimensional test function, the PMP criterion converges faster when the updated points are two. As the number of updated points increases, the number of iterations does not change much, indicating that the parallel infill sampling process www.nature.com/scientificreports/ does not accelerate. The MMP criterion has the problem of falling into the local optima. DMP criterion improves this problem and is better than the result of using the EI parallel criterion. The optimization efficiency of the DMP criterion is increased by 20% compared with the PEI criterion in the two updating points process. As the number of points increases, the efficiency improvement is no longer obvious The law of the parallel method including the PI criterion is similar to that of the EI criterion. But the optimization converges speed is low. The convergence value is scattered and a large number of abnormal points appear. For the three-dimensional test function, the medians of PEI, PPI, KB, and MPI methods all have a significant improvement, and the distribution of interquartile range is concentrated. Figure 8 shows the test result of the Hartman6 function. Although the median is low, the interquartile range of DEI, PEI, KB, PPI, DPI, and MMP criteria are scattered. The main reason is that these criteria always converge near the optimal value. It can also be seen from the inverse design of airfoil. The DMP and PMP criteria have a high median and a concentrated interquartile range indicating that these criteria are more robust. Figure 9 shows the test result of the StyblinskiTang7 function. With the increase of the number of updated points, the median of the optimal value is increased for DEI, PEI, DPI, MPI, PPI, and PMP criteria, which shows that the efficiency of parallel methods is reduced. The median of the optimal value of the KB, DMP, and MMP criteria is almost constant. The optimal values DMP criteria are low.
The test result of the Rosenbrock10 function is shown in Fig. 10. When the updated points are two, DMP, PMP, DPI, MPI and PPI criteria have higher efficiency. As the updated points increase, the efficiency is gradually reduced. PMP criterion even traps in the local optimal solution. DEI, PEI, and KB criteria have a similar median. DMP, PPI, and MPI criteria have a smaller median.
The test result of the Rastrigin12 function is shown in Fig. 11. It can be found that the optimal values obtained by DMP, PEI, and KB criteria are lower. And the interquartile range is narrower, but there are some outliers. The efficiency of parallel criteria for the high dimensional problems is decreased. KB criterion has a lower median and fewer outliers. www.nature.com/scientificreports/ The results of test functions show that the adaptive distance function proposed in this paper effectively improves the problem that the traditional distance function is easy to fall into local optima. At the same time, after it is introduced into the MP criterion, its optimization rate is also optimal.
After the pseudo method is introduced into the MP criterion, although the problem of falling into the local optima is improved, the optimization rate is relatively slow. After the introduction of the PI criterion, the optimization rate is similar. The optimization rate of the DEI method for low-dimensional problems is slower, but it is not easy to fall into the local optima. The optimization rate of the DMP method is improved, and at the same time, it does not fall into the local optima.

Application in the inverse design of airfoil
Demonstrative example. The efficient global optimization (EGO) algorithm based on the Kriging model is widely used in airfoil design and optimization 21,22 . In this paper, the difference between the pressure distribution of the designed airfoil and the target pressure distribution is taken as the objective function. And the inverse design problem is transformed into the optimization problem.
where the C p is the pressure coefficient of the target airfoil and Ĉ p is the pressure coefficient of the designed airfoil. N is the number of points on the airfoil surface.
The inverse design of the airfoil is performed using the CST method [23][24][25] . We first give a "thick" shape and a "thin" shape and then fit the two shapes by CST. We consider the CST parameters of the "thin" shape as the lower bound of the design variables and the parameters of the "thick" shape as the upper bound. The target airfoil should be included in the design space. The design space is shown in Fig. 12. The fourth-order CST method is used, and the total number of variables is 10. It can be seen from Fig. 13 that the RAE2822 airfoil can be reproduced.  Figure 15 shows the numerical results and the experimental results of the pressure distribution on the airfoil surface. The numerical results are consistent with the experimental results. So the numerical method used in this paper has a high-level accuracy.
The initial samples are generated by LHS, and the total number of samples is 40. First, three single point criteria, EI, MP, and PI, are tested. The maximum number of cycles is set to 300. Later, parallel infill sampling criteria, KB, PEI, DMP, PMP, and MPI, were tested. The maximum number of cycles is set to 300, 200, 100 corresponding to adding 2, 3, and 6 updated points. A combination of the EI and MP method is also tested. Figure 16 shows the result of the single updated point. The convergence curve of the MP criterion continues to decrease. It converges to 0 after 120 cycles. The convergence rate of MP in the early stage is slower than the others. The convergence rate of the PI is the fastest. EI and PI criteria only converge to about 0.3. Figure 17 shows the result of two updated points. EI + MP, MPI, and DMP criteria have a higher convergence rate. The convergence rate of the PMP criterion is gradually slowed down, but it can also converge to zero in the end. The convergence rate and optimal value of the KB and PEI criteria are similar. The optimal value is about 0.3. Figures 18 and 19 show the result of three and six updated points. KB has the fastest convergence rate, but only converges to about 0.4. As the number of updated points increases, the convergence rate of PMP and DMP criteria decreases. The optimal value still converges to 0. The convergence rate of PEI in the early stage decreases. The overall efficiency increases.
The results of the inverse design of the airfoil show that the DMP criterion also shows a better convergence rate and a smaller convergence value. The MPI method has a higher convergence rate in the early stage and the www.nature.com/scientificreports/ optimal value is higher. Therefore, the MPI criterion can be used in the early stage to accelerate the convergence speed in practical applications and later adopt the parallel method including MP criterion to obtain the optimal value.

Conclusions
In this study, the "pseudo" method is applied to the MP criterion and PI criterion. An adaptive distance function is proposed. Test problems were used to evaluate these criteria to verify the effectiveness of these methods. Some parallel criteria are used in the inverse design of an airfoil to compare the differences between these methods in the optimization speed and the optimal value, which can provide a reference for actual use selection. After the pseudo method is introduced into the MP criterion, although the problem of falling into the local optima is improved, the optimization rate is relatively slow. After the introduction of the PI criterion, the optimization rate is close to the PEI criterion. The optimization rate of the DEI method for low-dimensional problems is slower, but it is not easy to fall into the local optima. The optimization rate of the DMP method is improved, and at the same time, it does not fall into the local optima. Through the comparison of various parallel criteria, the DMP, DMP, PPI, PEI, and KB methods have better convergence speed and robustness.
The results of the inverse design of airfoil show that the parallel methods including the MP criterion have the best optimization value The convergence rate of the MPI is faster than the PEI and the KB, but the optimization values are close. Moreover, the MPI criterion can be used in the early stage to accelerate the convergence speed and later adopt the parallel method including the MP criterion to obtain the optimal value.